474 research outputs found

    Expanded Phenotypic Diagnoses for 24 Recently Named New Taxa of Hesperiidae (Lepidoptera)

    Expanded diagnoses by phenotypic characters are provided and illustrated for the 24 new taxa named in the article "Genomes of skipper butterflies reveal extensive convergence of wing patterns" by Li, W., Cong, Q., Shen, J., Zhang, J., Hallwachs, W., Janzen, D.H. and Grishin, N.V., published in the Proceedings of the National Academy of Sciences of the United States of America on March 15, 2019. These more detailed diagnoses will help identify the phylogenetic groups by their wing patterns and shapes and by other morphological characters, including the structures of antennae and genitalia, using this single publication, without obtaining sequences or consulting the additional works that the original diagnoses referenced for brevity

    Estimates of statistical significance for comparison of individual positions in multiple sequence alignments

    BACKGROUND: Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities between (1) MSA position and a set of predicted residue frequencies, and (2) between two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families. RESULTS: For problems (1) and (2), we propose analytical estimates of P-value and apply them to the detection of significant positional dissimilarities in various experimental situations. (a) We compare structure-based predictions of residue propensities at a protein position to the actual residue frequencies in the MSA of homologs. (b) We evaluate our method by the ability to detect erroneous position matches produced by an automatic sequence aligner. (c) We compare MSA positions that correspond to residues aligned by automatic structure aligners. (d) We compare MSA positions that are aligned by high-quality manual superposition of structures. Detected dissimilarities reveal shortcomings of the automatic methods for residue frequency prediction and alignment construction. For the high-quality structural alignments, the dissimilarities suggest sites of potential functional or structural importance. CONCLUSION: The proposed computational method is of significant potential value for the analysis of protein families
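Problem (1) in the abstract above, comparing one MSA position against a set of predicted residue frequencies, can be illustrated with a minimal sketch. This is not the analytical P-value estimate the authors propose, which the abstract does not spell out. It swaps in a generic Pearson chi-square statistic with a Monte Carlo P-value, and the function names, the toy column, and the predicted frequencies are all hypothetical.

```python
import random
from collections import Counter


def chi_square_stat(observed_counts, expected_freqs):
    """Pearson chi-square statistic of observed residue counts against
    expected frequencies. Assumes every observed residue has a nonzero
    expected frequency; residues outside expected_freqs are ignored."""
    n = sum(observed_counts.values())
    stat = 0.0
    for residue, freq in expected_freqs.items():
        expected = n * freq
        if expected > 0:
            stat += (observed_counts.get(residue, 0) - expected) ** 2 / expected
    return stat


def monte_carlo_p_value(column, expected_freqs, n_sim=2000, seed=0):
    """Estimate how often a column drawn from expected_freqs is at least
    as dissimilar (by chi-square) as the observed column."""
    rng = random.Random(seed)
    residues = list(expected_freqs)
    weights = [expected_freqs[r] for r in residues]
    observed = chi_square_stat(Counter(column), expected_freqs)
    hits = 0
    for _ in range(n_sim):
        sample = rng.choices(residues, weights=weights, k=len(column))
        if chi_square_stat(Counter(sample), expected_freqs) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_sim + 1)


# Hypothetical predicted frequencies and two toy alignment columns.
predicted = {"A": 0.5, "G": 0.3, "S": 0.2}
p_match = monte_carlo_p_value("AAAAAGGGSS", predicted)
p_mismatch = monte_carlo_p_value("SSSSSSSSSS", predicted)
```

A column that matches the predicted frequencies yields a large P-value, while a column dominated by an unlikely residue yields a small one. Simulation is used here only because it needs no distributional assumptions for short columns; an analytical estimate, as in the paper, would avoid the sampling cost.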

    Exploring dynamics of protein structure determination and homology-based prediction to estimate the number of superfamilies and folds

    BACKGROUND: As tertiary structure is currently available only for a fraction of known protein families, it is important to assess what parts of sequence space have been structurally characterized. We consider protein domains whose structure can be predicted by sequence similarity to proteins with solved structure and address the following questions. Do these domains represent an unbiased random sample of all sequence families? Do targets solved by structural genomic initiatives (SGI) provide such a sample? What are approximate total numbers of structure-based superfamilies and folds among soluble globular domains? RESULTS: To make these assessments, we combine two approaches: (i) sequence analysis and homology-based structure prediction for proteins from complete genomes; and (ii) monitoring dynamics of the assigned structure set in time, with the accumulation of experimentally solved structures. In the Clusters of Orthologous Groups (COG) database, we map the growing population of structurally characterized domain families onto the network of sequence-based connections between domains. This mapping reveals a systematic bias suggesting that target families for structure determination tend to be located in highly populated areas of sequence space. In contrast, the subset of domains whose structure is initially inferred by SGI is similar to a random sample from the whole population. To account for the observed bias, we propose a new non-parametric approach to the estimation of the total numbers of structural superfamilies and folds, which does not rely on a specific model of the sampling process. Based on dynamics of robust distribution-based parameters in the growing set of structure predictions, we estimate the total numbers of superfamilies and folds among soluble globular proteins in the COG database. CONCLUSION: The set of currently solved protein structures allows for structure prediction in approximately a third of sequence-based domain families. The choice of targets for structure determination is biased towards domains with many sequence-based homologs. The growing SGI output should further reduce this bias in the future. The total numbers of structural superfamilies and folds in the COG database are estimated as ~4000 and ~1700, respectively. These numbers are, respectively, four and three times higher than the numbers of superfamilies and folds that can currently be assigned to COG proteins

    Thirteen new species of butterflies (Lepidoptera: Hesperiidae) from Texas

    Analyses of whole genomic shotgun datasets, COI barcodes, morphology, and historical literature suggest that the following 13 butterfly species from the family Hesperiidae (Lepidoptera: Papilionoidea) in Texas, USA are distinct from their closest named relatives and therefore are described as new (type localities are given in parentheses): Spicauda atelis Grishin, new species (Hidalgo Co., Mission), Urbanus (Urbanus) rickardi Grishin, new species (Hidalgo Co., nr. Madero), Urbanus (Urbanus) oplerorum Grishin, new species (Hidalgo Co., Mission/Madero), Telegonus tsongae Grishin, new species (Starr Co., Roma), Autochton caballo Grishin, new species (Hidalgo Co., 6 mi W of Hidalgo), Epargyreus fractigutta Grishin, new species (Hidalgo Co., McAllen), Aguna mcguirei Grishin, new species (Cameron Co., Brownsville), Polygonus pardus Grishin, new species (Hidalgo Co., McAllen), Arteurotia artistella Grishin, new species (Hidalgo Co., Mission), Heliopetes elonmuski Grishin, new species (Cameron Co., Boca Chica), Hesperia balcones Grishin, new species (Travis Co., Volente), Troyus fabulosus Grishin, new species (Hidalgo Co., Peñitas), and Lerema ochrius Grishin, new species (Hidalgo Co., nr. Relampago). Most of these species are known in the US almost exclusively from the Lower Rio Grande Valley in Texas. Nine of the holotypes were collected in 1971-1975, a banner period for butterfly species newly recorded from the Rio Grande Valley of Texas; five of them were collected by William W. McGuire and one by Nadine M. McGuire. At the time, these new species had been recorded under the names of their close relatives. A neotype is designated for Papilio fulminator Sepp, [1841] (Suriname). Lectotypes are designated for Goniurus teleus Hübner, 1821 (unknown, likely in South America), Goniloba azul Reakirt, [1867] (Mexico: Veracruz) and Eudamus misitra Plötz, 1881 (Mexico). Several taxonomic changes are proposed.
The following taxa are species (not subspecies): Spicauda zalanthus (Plötz, 1880), reinstated status (not Spicauda teleus (Hübner, 1821)), Telegonus fulminator (Sepp, [1841]), reinstated status (not Telegonus fulgerator (Walch, 1775)), Telegonus misitra (Plötz, 1881), reinstated status (not Telegonus azul (Reakirt, [1867])), Autochton reducta (Mabille and Boullet, 1919), new status (not Autochton potrillo (Lucas, 1857)), Epargyreus gaumeri Godman and Salvin, 1893, reinstated status (not Epargyreus clavicornis (Herrich-Schäffer, 1869)), and Polygonus punctus E. Bell and W. Comstock, 1948, new status (not Polygonus savigny (Latreille, [1824])). Urbanus ehakernae Burns, 2014 and Epargyreus socus chota Evans, 1952 are junior subjective synonyms of Urbanus alva Evans, 1952 and Epargyreus clavicornis (Herrich-Schäffer, 1869), respectively, and Epargyreus gaumeri tenda Evans, 1955, new combination, is not a subspecies of E. clavicornis

    Reconstruction of ancestral protein sequences and its applications

    BACKGROUND: Modern-day proteins were selected during long evolutionary history as descendants of ancient life forms. In silico reconstruction of such ancestral protein sequences facilitates our understanding of evolutionary processes, protein classification and biological function. Additionally, reconstructed ancestral protein sequences could serve to fill in sequence space thus aiding remote homology inference. RESULTS: We developed ANCESCON, a package for distance-based phylogenetic inference and reconstruction of ancestral protein sequences that takes into account the observed variation of evolutionary rates between positions that more precisely describes the evolution of protein families. To improve the accuracy of evolutionary distance estimation and ancestral sequence reconstruction, two approaches are proposed to estimate position-specific evolutionary rates. Comparisons show that at large evolutionary distances our method gives more accurate ancestral sequence reconstruction than PAML, PHYLIP and PAUP*. We apply the reconstructed ancestral sequences to homology inference and functional site prediction. We show that the usage of hypothetical ancestors together with the present day sequences improves profile-based sequence similarity searches; and that ancestral sequence reconstruction methods can be used to predict positions with functional specificity. CONCLUSIONS: As a computational tool to reconstruct ancestral protein sequences from a given multiple sequence alignment, ANCESCON shows high accuracy in tests and helps detection of remote homologs and prediction of functional sites. ANCESCON is freely available for non-commercial use. Pre-compiled versions for several platforms can be downloaded from

    PALSSE: A program to delineate linear secondary structural elements from protein structures

    BACKGROUND: The majority of residues in protein structures are involved in the formation of α-helices and β-strands. These distinctive secondary structure patterns can be used to represent a protein for visual inspection and in vector-based protein structure comparison. Success of such structural comparison methods depends crucially on the accurate identification and delineation of secondary structure elements. RESULTS: We have developed a method, PALSSE (Predictive Assignment of Linear Secondary Structure Elements), that delineates secondary structure elements (SSEs) from protein Cα coordinates and specifically addresses the requirements of vector-based protein similarity searches. Our program identifies two types of secondary structures: helix and β-strand, typically those that can be well approximated by vectors. In contrast to traditional secondary structure algorithms, which identify a secondary structure state for every residue in a protein chain, our program attributes residues to linear SSEs. Consecutive elements may overlap, thus allowing residues located at the overlapping region to have more than one secondary structure type. CONCLUSION: PALSSE is predictive in nature and can assign about 80% of the protein chain to SSEs as compared to 53% by DSSP and 57% by P-SEA. Such a generous assignment ensures almost every residue is part of an element and is used in structural comparisons. Our results are in agreement with human judgment and DSSP. The method is robust to coordinate errors and can be used to define SSEs even in poorly refined and low-resolution structures. The program and results are available at

    A tale of two ferredoxins: sequence similarity and structural differences

    BACKGROUND: Sequence similarity between proteins is usually considered a reliable indicator of homology. Pyruvate-ferredoxin oxidoreductase and quinol-fumarate reductase contain ferredoxin domains that bind [Fe-S] clusters and are involved in electron transport. Profile-based methods for sequence comparison, such as PSI-BLAST and HMMer, suggest statistically significant similarity between these domains. RESULTS: The sequence similarity between these ferredoxin domains resides in the area of the [Fe-S] cluster-binding sites. Although overall folds of these ferredoxins bear no obvious similarity, the regions of sequence similarity display a remarkable local structural similarity. These short regions with pronounced sequence motifs are incorporated in completely different structural environments. In pyruvate-ferredoxin oxidoreductase (bacterial ferredoxin), the hydrophobic core of the domain is completed by two β-hairpins, whereas in quinol-fumarate reductase (α-helical ferredoxin), the cluster-binding motifs are part of a larger all-α-helical globin-like fold core. CONCLUSION: Functionally meaningful sequence similarity may sometimes be reflected only in local structural similarity, but not in global fold similarity. If detected and used naively, such similarities may lead to incorrect fold predictions